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Abstract We present how entropy estimates and logarithmic Sobolev inequalities on 
the one hand, and the notion of quasi-stationary distribution on the other hand, are 
useful tools to analyze metastable overdamped Langevin dynamics, in particular to 

! ! quantify the degree of metastability. We discuss the interest of these approaches to 

estimate the efficiency of some classical algorithms used to speed up the sampling, 

v-^- and to evaluate the error introduced by some coarse-graining procedures. This pa- 

per is a summary of a plenary talk given by the author at the ENUMATH 201 1 

i-G conference. 

a 

1 Introduction and motivation 

> 

The aim of this paper is to present two mathematical viewpoints on metastability. 
£ — Roughly speaking, a dynamics is said to be metastable if it spends a lot of time in 

I" a region (called a metastable state) before hopping to another region. To be more 

specific, we will focus in the following on the overdamped Langevin dynamics: 

O dX, = -VV(X,)dt + JlB^dWt (1) 

(N 

which is used for example in molecular dynamics to describe the evolution of a 
molecular system. In this context, the configuration of the system X, £ M." is the 
J^j coordinates of the particles (think of the atoms of a large molecule), V : MP —> R is 

the potential energy, which to a configuration* e R" associates its energy V(x), and 
j3 _1 = kgT is proportional to the temperature (Icb being the Boltzmann constant). 
The stochastic process W t is a standard n-dimensional Brownian motion. For such 
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a dynamics, metastability typically originates from two mechanisms. First, in the 
small temperature regime, the dynamics ([TJ can be seen as a perturbation of the 
simple gradient dynamics y = — VV(y) for which, from any initial condition, the 
solution converges to a local minimum of V. Having this in mind, the dynamics ([TJ 
is metastable because it takes a lot of time to leave the vicinity of a local minimum 
before jumping to the neighborhood of another local minimum. This is due to the 
energy barriers which have to be overcome. Such barriers and the zero temperature 
limit can be analyzed in particular with large deviation techniques [19|. Second, 
metastability may come from entropic effects. Imagine that the configuration space 
is made of two boxes linked by a narrow corridor (the potential V is zero on this 
configuration space, and infinite outside). Then, the dynamics ([TJ is metastable be- 
cause it takes a lot of time to find the small corridor to go from one box to the other. 
Metastability is here due to entropic barriers. This can be quantified using the no- 
tion of free energy, see [36, 29 j. In practice, metastability thus originates from a 
combination of both energetic and entropic effects, the relative importance of each 
depending on the system under consideration, and on the temperature. See Figure[T] 
for a numerical illustration. Generally speaking, metastability is related to the mul- 
timodality of the measure jj,, namely the fact that some high probability regions are 
separated by low probability regions. 
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Fig. 1 Above: (left) an example of a 2 dimensional potential, with energetic barriers (there are 
actually two possible saddle points to go from left to right), with (right) the x-component as a 
function of time of the associated stochastic process solution to Below (left) an example of a 
2 dimensional potential (which is zero inside the closed solid-line shape, and infinite outside) for 
which an entropic barrier (right) is observed. 
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An important concept related to metastability is ergodicity. Under adequate as- 
sumptions on V, the dynamics ([TJ can be shown to be ergodic with respect to the 
canonical measure: 

/i(dx)^Z- 1 exp(-pV(x))dx, (2) 

where Z = J R „ exp(— j5V (x)) dx is assumed to be finite. Ergodicity actually refers to 
two different properties: (i) an average along a trajectory converges in the long-time 
limit towards an average with respect to fi: for any test functions <p : M" — > K, 

lim ] _ [ (p(X,)dt = [ (pdfi, (3) 

T^<>= 1 Jo Jw 

and (ii) the law of the process X, at time t converges to jj. in the long-time limit: for 
any test functions <p : K" — > R, 



limE(<p(X r ))= / (pd/x. (4) 



If X, is metastable, both limits <|3j and Q are typically very difficult to reach, since T 
should be sufficiently large to visit all the metastable states. From a numerical view- 
point, metastability raises thus sampling issues, both to compute canonical averages 
(namely averages with respect to ji) and to compute averages over paths which re- 
quires to generate efficiently metastable dynamics, the latter being of course a more 
complicated task than the former. 

In the following, we would like to introduce two mathematical tools to measure 
the "degree of metastability" of the dynamics ([l}. In Section[2] we discuss the notion 
of Logarithmic Sobolev inequality, which is a way to quantify the ergodic features of 
a process, and more precisely, how fast the convergence Q happens. In this context, 
the slower the convergence, the more metastable the process is. In Section [3] we 
introduce the notion of quasi-stationary distribution, and identify the typical time 
it takes, for a given region of the state space, to reach a quasi equilibrium in this 
region before leaving it. Metastability in this case is related to the fact that this time 
is small compared to the typical time it takes to leave the region. In both cases, we 
will explain how these tools can be used to (i) analyze numerical methods which are 



used in molecular dynamics to "accelerate" metastable dynamics (see Sections 2.3 
and |3.2[ ) and (ii) obtain coarse-grained descriptions of the metastable dynamics (see 
Sections |2~4land[33|). 

We would like to emphasize the importance of quantifying metastability for 
practical aspects. Indeed, there exists in the literature many asymptotic analysis 
in some limiting regimes (zero temperature limit, time-scale separation limit en- 
forced through an explicit small parameter introduced in the dynamics, etc..) where 
it is shown how a metastable dynamics converges to some effective Markovian 
dynamics. In practice, the parameters which are considered to go to zero in the 
asymptotic regimes may indeed be small, but are certainly not zero. A natural ques- 
tion is to quantify the error introduced by assuming that these parameters are zero, 
an assumption which is behind many numerical methods, and coarse-graining ap- 
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proaches. This requires in turn quantifying the metastable features of the original 
dynamics. This is precisely the aim of both approaches presented below. 

Let us finally clearly state that the aim of this paper is not to provide proofs of 
the announced results and sometimes even not to state them very precisely math- 
ematically, but to gather in a new and hopefully enlightening way various recent 
studies, in particular ll35l l29l l28l . All the statements below can be reformulated as 
precise mathematical claims, with rigorous proofs, except the discussion in Sec- 
tion P 



3.3 which is more prospective. 



Remark 1. There are other techniques to quantify metastability, that we do not re- 
view here. We would like to mention in particular spectral approaches E4l l44ll . 
potential theoretic approaches |3 4| and approaches based on drift conditions l26ll . 
Drawing connections between these various techniques is an interesting subject, see 
for example [7| for connections between approaches based on drift conditions, and 
functional inequalities such as LSI. 

Remark 2. In this paper, we concentrate on the overdamped Langevin dynamics (FT) 
even though this is not the most widely used dynamics in molecular dynamics. All 
the algorithms we present below generalize to (and are used with) the phase-space 
Langevin dynamics, which is much more popular. However, generalizing the math- 
ematical approaches outlined below to the Langevin dynamics is not an easy task, 
due to the lack of ellipticity of the associated infinitesimal generator, see 1231 l47l 
for examples of studies in that direction. 



2 Logarithmic Sobolev inequality 



As explained above, we quantify in this section the metastability of (FT) by consid- 
ering the rate of convergence of the limit Q. We thus consider the law at time t 
of X t , which has a density \j/(t,x). The probability density function \j/ satisfies the 
Fokker-Planck equation: 

<9,y/ = div(VVy/ + /T 1 Vy/). (5) 

Notice that the density of ji with respect to the Lebesgue measure, denoted by 
i//oo(x) = Z exp(— /3V (x)), is obviously a stationary solution to (T5]). 



2.1 Definition 



Let us introduce the notion of logarithmic Sobolev inequality (see for example [ 1 1). 



Definition 1. The probability measure ji is said to satisfy a logarithmic Sobolev in- 
equality with constant R (in short LSI(R)) if and only if, for any probability measure 
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V such that v is absolutely continuous with respect to jJ,, 

ff(v|M) < ^(vlM) 

(I 
2 



where H(v\/J.) = J K „ In ( |^ j afv is the relative entropy of v with respect to /i, and 



Vln 



dv is the Fisher information of v with respect to ji. 



When both ji and v admit densities (respectively y and 0) with respect to the 
Lebesgue measure, we shall also use the notation H(y\<j>) for H(ji\v). A crucial 
property is the following: 

Proposition 1. The measure pL defined by |2} satisfies a LSI(R) if and only if, for all 
probability density functions \ffo, for all time t > 0, 

ff(y(f,-)IV~) <//(v/o|roo)exp(-2j3- 1 ^) (6) 

where \jf is the solution to |5]l with initial condition y/~(0, •) = 

This is a simple consequence of the standard computation: if y/ satisfies |5]), then 

^H(y(t,-)\y„) = -l3- 1 I( ¥ (t,-)\Y~). 

A natural way to quantify metastability if thus to relate it to R: 

The smaller R, the more metastable the dynamics ([T| is. (7) 

Before we proceed, let us make two remarks. First, by combining the classical 
Bakry-Emery criteria with the perturbation result of Holley and Stroock, the mea- 
sure ji actually satisfies a LSI under very mild assumptions on V: basically, if V is 
smooth, and is a-convex at infinity, then a LSI for jj. holds. What is more compli- 
cated is to get the optimal constant R. Second, in the simple case of a double well 
potential in dimension 1, it is standard to show that the average time spent by the 
process X, in a well before hopping to another one increases as the temperature goes 
to zero, using for example large deviation techniques. It can also be checked in this 
simple setting that the constant R scales like exp(— fiH) in the limit of large j3 (small 
temperature), where H is the height of the barrier to overcome to leave a given well. 
This prototypical situation thus shows that the LSI constant indeed allows to quan- 
tifying the intuitive definition of metastability we gave in the introduction. 



2.2 Metastability along a reaction coordinate 

Many algorithms and modelling discussions are based on the introduction of a so- 
called reaction coordinate ^ , namely a smooth low-dimensional function which typ- 
ically indices transition from one metastable state to another. For simplicity, let us 
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assume that % : M." — > T has values in the one-dimensional torus (think of an an- 
gle in a molecule, which characterizes its conformation). Then, one may introduce 
probability measures associated to ji and % : 

• the image % * [i of the measure /I by which is a probability measure on the 
torus T, and which is also written as ^ * H(dz) = exp(—f5F(z))dz, F : T — » K 
being then the so-called free energy associated to fi and Using the co-area 
formula, a formula for F is given by 

-ii„ f -7-1. 



F(z) = -J3- I ln / Z- L exp(-pv(x))8 ((x y t (dx) (8) 

JZ(z) 

where L(z) — {xG M",^(x) = z} and 8^ x )_ z (dx) is a measure supported by L(z) 
such that dx = 8^i x \ z {dx)dz. 
• the family of conditional probability measures ju(-|<!; (x) — z) with support E(z), 
which are indexed by z £ T and defined as 

**I«W=*> = exp(-/3F(z)) ' 

These two measures are completely defined through the conditioning formula: for 
any test functions: <p : W -> R and iff : T -> R, 

/ <K§(*)M*)M / <P(z) / ^(jc)M(^|§(x)=z)exp(-j8F( z ))rfz. 



For proofs, we refer for example to 

Let us assume that the measures (x) = z) satisfy a LSI(p) (with p uniform 
inz), and that the measure t, *{i = Z exp(— f5F(zj) dz satifies a LSI(r). In the spirit 
of the previous subsection, we will say that (this will be assumption (HI) below): 

"the metastability of the process X, is along ^" if and only if p ^ r. (9) 

A typical example of such a situation is given on Figure [T] for which ^ (x,y) = x. In 
such a two-dimensional setting, notice thatF(x) = — ln/ R Z _1 exp(— /3 V (x,y)) dy 

a i j mi \ \ QX V{-PV{x,y))dy 
and M (^ y )=*) = exp( _^ (x)) ■ 

It is possible to relate the LSI constant of the measure (namely R) to the LSI 
constants of the measures ji(-\^(x) = z) (namely p) and ^ * ji (namely r), see l32l . 
Roughly speaking, 

If r is small and p is small, then R is small. (10) 

Moreover, if R is small and | is well chosen, then r is small but p may be very large. 
In such a case, the metastability of the process X t is essentially encoded in the low- 
dimensional observable £,(X t ). It is then possible to use numerical and analytical 
techniques to accelerate the long-time convergence and go around the difficulties 
associated to metastability, as explained in the two next subsections. This could 
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also be used to yield a definition of what a good reaction coordinate is: it is a (low- 
dimensional) function % such that p fr is as large as possible. Designing a numerical 
method which would look for the best t, in this respect would be very interesting 
for practical applications. 

Before we proceed, we provide a formula that will be useful below. By using the 
co-area formula, starting from ([8]), it is possible to check that the derivative of F 
writes: 

f'(z) = y/(*)M(^(*)=z)with/=— ^-/rW (id 

In the simple case | (x,y) = x mentioned above, the function / is simply / = d x V. 



2.3 A first example: the adaptive biasing force technique 

As explained above, one difficulty related to metastability is that the convergence Q 
is very slow. In particular, this implies that it will be difficult to sample the canonical 
measure fi from a trajectory X t . The fact that it is difficult to sample a multimodal 
measure is a well known problem shared with other fields than molecular dynamics. 
In statistics for example, Markov Chain Monte Carlo methods are very popular and 
similar sampling problems occur for Bayesian inference iflOl . 

Under assumption a natural importance sampling idea is the following. If the 
metastability of X t is along § , it is sensible to try to remove these metastable features 
by changing the potential V to V — F o | , where F is the free energy <(8j (and F oB, 
denotes the composition of F with §). Indeed, if we denote 

ju F (dx) = Zp 1 exp(- J3 (V - F o § ) (x) ) dx 

the associated tilted measure, we clearly have E, * fif = lj(x)dx and jUf(-|^ [x) = 
z) = jU (- 1 § (jc) = z): the conditional measures remain the same, but the marginal 
along 4 is now a uniform measure, namely a very gentle measure, without any 
multimodality, and thus easy to sample. In other words, following (jTOJ, we may 
hope that the LSI constant of the tilted measure jip is much smaller than the one 
of the original measure fi. As a numerical illustration of this fact, we present on 
Figure [2] the equivalent of the trajectories presented on Figure [T] using the biased 
potential V — F o £ . We clearly observe much more transitions from left to right with 
the biased potential, both in the case of an energetic barrier and an entropic barrier. 

The difficulty is of course that computing the free energy F is a challenge in it- 
self (it corresponds to a sampling problem of, a priori, a similar complexity as the 
sampling of fi). The free energy is actually a quantity of great interest for practi- 
tioners in itself [9], and many methods have been designed to compute it [36 |. Thus, 
biasing the measure using F does not seem to be such a good idea. The principle 
of adaptive biasing technique is to actually use, at a given time t, an approximation 
F t of the free energy F (in view of the configurations visited so far) in order to bias 



8 



Tony Lelievre 




Fig. 2 See Figure^ for comparison. The reaction coordinate is £ (x,y) = x. Above: In the energetic 
barrier case, (left) the 2 dimensional potential minus the free energy and (right) the x-component 
of the associated stochastic process. Below: In the entropic barrier case, (left) the free energy 
and (right) the x-component of the stochastic process simulated again with the free energy biased 
potential. 



the dynamics. In other words, instead of using the biased dynamics (which assumes 
that F is already known) 

f dX t = -V(V-Fo£)(X t )dt + y/2fF I dW t , 12) 
\F'(z)=E^f(XM(X)=z), 

where the formula for F'(z) is exactly ( fTTj ) (here and in the following, Ep denoting 
an expectation taken with respect to the measure fi), one rather considers 



\ dX t = -W(V-F t o^)(X t )dt + ^2p^dW t , 
\F t >(z)=E(f(X t )\t;(X t )=z). 

The bottom line is that if X t solution to (jT3j was at equilibrium instantaneously with 
respect to jip t , then F' t would be exactly F'. Of course, this instantaneous equilib- 
rium assumption is wrong, but the hope is that F[ learns on the fly (and eventually 
converges to) F' . 

The dynamics ( fT3| > is the so called adaptive biasing force (ABF) process, and is 
one of the most efficient methods to compute free energy differences, see IPT31 1251 
for the original idea. One way to understand such a method is that already visited 
states are penalized (the potential is flattened in these regions, or equivalently, the 
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probability to visit them is increased) in order to force the stochastic process to visit 
new regions. There are many other techniques along these lines, see ||34l . The hope 
is that, by forcing the system to visit all the possible values of £ , the metastability 
of the original dynamics is completely overcome. This is somewhat in the spirit of 
simulated annealing or parallel tempering, where the temperature (which in some 
sense plays the role of £,) of the system is changed in order to visit new regions. 
What can be shown is that under the assumptions: 

• (HI) the metastability of the process X t is along £, (see |9])), 

• (H2) the cross derivative ^EU)f i s bounded (where Vjv z ) denotes the gradient 
projected onto the tangent space to E(z)), 

then the convergence of the ABF process ([12} to equilibrium is much faster (ba- 
sically exponential with rate y3 1 p) compared to the convergence of the original 
process ([T]) to equilibrium (which is exponential with rate f3~ 1 R, see |6]i). In partic- 
ular, F, converges to the free energy F very fast if | is such that p is large (this is 
assumption (HI)). We refer to [35] for a precise mathematical statement. The proof 
is based on entropy techniques [2 1, and the idea of two-scale analysis for logarithmic 
Sobolev inequalities Il22ll32l . 

We also refer to 0311 for some refinements in the cases when p is only large for 
some of the family of conditional probability measures jU (- 1 § (jc) = z) indexed by z 
(the so-called bi-channel situation). For practical aspects (discretization techniques 
and numerical illustrations), we refer to [34- 8, 27] and to flQl for an application of 
such techniques in the context of Bayesian inference in statistics. 

Remark 3. In the long-time limit, F is obtained, and the measure sampled by ( fT3j ) is 
not but jip. There are basically two ways to recover averages with respect to jl. 
First, as in standard importance sampling approaches, reweighting can be used: 

For this idea to be efficient, the weights should not be too widespread (otherwise 
the variance may be large), which means that supF — infF should not be too large, 
see ifTUl for a discussion of these aspects. Another idea is to use a conditioning 
approach: 

/ E M (p(X)|§(X)= z )«pH3F(z))<fe 

E M (<p(X)) = 



f exp{-pF{z))dz 



The conditional probabilities E jl (<p{X)\^(X) — z) can then be computed using ei- 
ther the fact that E M (<p(X)|^(X) = z) = E^(ip{x)\^(X) = z) (so that the ABF 
process ( fl~3~| > can be used to compute them) or using dedicated techniques to sam- 
ple the conditional probability measure jj,(-\£,(x) =z), which should be easy under 
assumption (HI). Indeed, if p is large, in virtue of Proposition [T] the overdamped 
Langevin dynamics associated to the measure jj,(-\£,(x) =z) should converge very 
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fast to equilibrium. Such a dynamics is roughly speaking a projection of the original 
gradient dynamics ([T| to the submanifold E(z). We refer to ifTTl [36l [37ll for more 
information on such constrained sampling techniques. 



2.4 A second example: obtaining an effective dynamics on t, (X t ) 

If we are in the situation (|9]l where the metastability of X t is along % , another idea is 
to try to derive an effective dynamics for % (X r ), which would then be easy to simu- 
late since it is low-dimensional, and hopefully associated with a smaller character- 
istic timescale than the original dynamics (fill. The idea is that if the metastability 
is along then £, (X t ) should move much more slowly than the components of X, 
which are along "directions orthogonal to so that some averaging should be pos- 
sible along those directions. This is very much in the spirit of projection operator or 
Mori-Zwanzig techniques l20l . Below, we first derive an effective Markovian dy- 
namics for 4 (Xt) and then assess the quality of this effective dynamics by deriving 
quantitative bounds. These quantitative bounds are again obtained using logarithmic 
Sobolev inequalities and entropy computations. 

The idea to derive an effective dynamics on % (X,) starts from a simple Ito calcu- 
lus. If (X t ) t >Q satisfies fl}, then 

dSQQ = (-W- V| +p- l At;)(X t )dt + ^^\Vt;(X t )\^^-dW t . 

Of course, this is not a closed equation for the evolution of | (X,). It is not difficult 
to check that if we the consider 



with 



and 



dz t =b(t,z t )dt + \/2l3- l d(t,zt)dB t 
b(t,z) = E f (-W • V£ +p- l AZ)(X t ) £ (X t ) - z 



a 2 ( tl z) = E^\ 2 (X t )\UX t ) = z 



then, for all time t > 0, the law of the random variable ^(X t ) is equal to the law 
of the random variable z t . The difficulty of course is that b and a are intractable 
numerically, since they are functions depending on time. A natural idea (following 
the intuition given at the beginning of this section) is that one could replace the con- 
ditional expectations defining b and a 2 by conditional expectations at equilibrium, 
namely: 

dzt=b{z t )dt + ^2^a{z,)dB t (14) 

with 

b(z)=E^[(-VV-n+l3- l A^(X) §(X) = 
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and 



a 2 (z)^E,(\^\ 2 (X)\^X)^z) 



For related approaches, we refer to lfl4l l40l l42l . 

Now that we have derived a Markovian dynamics ( fT4| i on Zt, which should be 
such that (Zt)t>o is close to (%(X t )) t >Q, a natural question is whether we can give 
some error estimate of some "distance between the two processes". What we have 
shown in [29] is that under the same assumptions (HI) and (H2) needed for the 
analysis of the longtime convergence of the ABF dynamics, the relative entropy of 
the law at time t of %(X t ) with respect to the law at time t of Zt is bounded from 
above uniformly in time by a constant divided by p 2 . Thus, the larger p (this is 
exactly (HI)), the closer these two probability measures are, for all times. We refer 
to ||29ll30l for a precise mathematical statement. The proof uses basically the same 
ingredients as for the analysis of the long-time convergence of the ABF process. 

Remark 4. The result we mention above only concerns the closeness of the marginals 
in time of the two processes, namely the laws, at times t > 0, of % (X t ) and z r . Of 
course, the stochastic processes (| (X t )) t >o and (zt)t>0 contain much more informa- 
tion than their marginal in times (think of time correlations, first time of escape from 
a well, typical paths to go from one well to another, etc.), which are of interest in 
the context of molecular dynamics simulation. A natural question is thus whether 
one can prove similar results on the law of the paths. This can indeed be done, under 
very close assumptions to (HI) and (H2), on finite time intervals. We refer to I3T1 . 



3 Quasi-stationary distribution 

In this second part, we would like to introduce another tool to study metastability, 
and to show in particular how this tool may be useful to analyze the parallel replica 
dynamics |48 1 which is a numerical method to efficiently generate a trajectory of a 
metastable dynamics. 

Let us again consider the dynamics ([TJ, and let us assume that we are given some 
partition of the state space, for example through an application 

,y : R" ->■ N 

which to a given configuration x associates the number S*(x) of the state in which x 
is. In some sense, 5? could be thought as an equivalent to the reaction coordinate | 
of the previous sections, but with discrete values. In the following, one should think 
of the states (for leN) 

ffi = {.tel",y(.i)=i} 

as the metastable regions we mentioned in the introduction (think for example of 
the basins of attraction of the local minima of V). Typically, when X, enters a state 
Wk, we would like it to stay in the state for a long-time before it leaves the state. 
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To formalize this idea, and to quantify the error introduced when this assumption is 
done in some algorithms, we rely on the notion of quasi-stationary distribution. In 
the following, we assume that the states Wk are smooth bounded connected subsets 
ofW. 



3.1 Definition and two basic properties of the quasi-stationary 
distribution 

In this section, we consider a given state for a fixed index k, and we denote by 
W the interior of Wf,- Let us introduce the notion of the quasi-stationary distribution 
associated to the well W. We refer to @ ETJ [39] El 06] [43] [16] [HI Q2) for general 
introductions to the quasi-stationary distribution. 

Definition 2. The quasi-stationary distribution (QSD) associated to the dynam- 
ics ([TJ) and the state W is defined as a measure v with support in W and such that 

Vf > 0, VA c W, 



F(X?€A,t<T$)v(dx) 



v(A) 

/ F(t<T w )v(dx) 
Jw 

where X, denotes the solution to ([TJ such that Xq = x and T w = inf{f > 0,^ ^ W}. 



In other words, if Xq is distributed according to v and if (X s ) s >q solution to ([T| has 
not left the state W on the interval [0,t], then X t is also distributed according to v. 

The QSD enjoys two other properties which are very important in practice, given 
in the two following propositions. To present these two properties, we need to 
state an intermediate result. Let L = — VV • V + j3 _1 A be the infinitesimal gener- 
ator associated to ([TJ. Then the density u\ of v with respect to fi (namely such that 
v(dx) — u\(x)n{dx)) is the first eigenfunction of L with zero Dirichlet boundary 
condition on dW (and with normalization f w u\dji — 1). In other words: 

f \Vu\ 2 dn 

Ml e argmin (15) 

ueHl Q {W) / u 2 d\i 
Jw 

and 

{Lu\ — — Xi u\ in W, 
i (16) 
«i =0ondW. 

Here H* fl (W) = {u : W ->R, J w (|Vm| 2 +m 2 ) dji < o° and u = on dW}. The op- 
erator L can be shown to be negative, self-adjoint on L 2 , and with a discrete spec- 
trum ( — X\ , — A2 , • • • , — A„ , . . . ) (the eigenvalues are in decreasing order and counted 
with multiplicity) and associated eigenfunctions [u\ ,i<2, . . . ,u n , . . .) in (W). The 
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first eigenstate can be shown to be non-degenerate (A2 > Ai > 0), and the first eigen- 
function does not vanish on W, and can thus be assumed to be positive {u\ > on 
W). Using this spectral decomposition of the infinitesimal generator, and this char- 
acterization of the QSD, one can show the following results. 

Proposition 2. Let Xq be distributed according to a distribution with support in W, 
and let us consider (X t ) t >Q solution to Q. Then, the law ofX t conditionally to the 
fact that the process remained in W up to time t converges to the QSD V: for any 
test function (p : W — > M, 

limE((p(X t )\t<T w )= f <pdv. 

Jw 

Moreover, the rate of convergence is exponential: 
sup 

(peL-(w),\\<p\\ L ~, {w) <i 

where, as explained above, —X\ > — A2 are the two first eigenvalues of the operator 
L = — W • V + j3~'zi considered on W with zero Dirichlet boundary conditions on 
dW. 

The proof of this proposition is based on a spectral decomposition of the solution 
to the associated Feynman-Kac partial differential equation, and the simplest proof 
actually requires the law of Xq to have a density with respect to which is in L^. 

This Proposition has two consequences. First, if the process X t enters a state and 
stays sufficiently long in the state, then its marginal in time is close to the QSD. Sec- 
ond, one could also think of considering the following interacting particle system, 
which samples the law of X, conditionally to the fact that t < T\y: 

• Consider initial conditions (Xq)i<„<at distributed independently according to 
a given law with support in W; 

• Let them evolve according to ([TJ, driven by independent Brownian motions; 

• Each time a replica leaves the state, it is killed, another replica is duplicated and 
the new walker then evolves again independently of the others. 

This is the so-called Flemming-Viot process 151 |2T1 [1711381 . In the limit of infinitely 
many replicas, the replicas are distributed according to the law of X t conditionally 
to the fact that t < 7V, so that, in the long-time limit, the replicas are distributed 
according to the QSD (according to Proposition|2]i. 

Finally, another crucial property of the QSD is the following: 

Proposition 3. Let Xq be distributed according to the QSD, and let us consider 
(Xt)t>o solution to ([TJ. Let us recall that Tw — inf{f > 0,X t ^ W}. Then, 

• The law of the exit time Tw is exponential with parameter X\; 

( 1 du\ \ 

• The law of the exit point Xj w is — — ^ — exp(— pV ) dAg w where n denotes 

V PM " n J 
the unit outward normal to W and Xg w is the Lebesgue measure on dW; 



E(<p(X,)\t <T W )- <pdv 



<Cexp(-(A2-Ai», (17) 
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• The two random variables T\y and Xj w are independent. 

For proofs of these well-known properties of the QSD in our precise setting, we 
refer to l28l . 

In this context, one could state that the region W is a metastable state for the 
dynamics ([1} if the typical time it takes to leave W is large compared to the typical 
time it takes to reach the QSD (which is 1 / (A2 — Ai ) according to (fT7]>), namely: 

The probability P [7^ < - — ) is close to zero. (18) 

\ A2 — M ) 

In some sense, this characterization ([18) of metastability is the counterpart, in terms 
of QSD, of the two characterizations |7]) and |9) we introduced above in terms of 
LSI. The difficulty to completely formalize ( fT8] l is that the law of the initial condi- 
tion Xq e W should be defined to make precise the law of Tw- For example, if Xq 
is distributed according to the QSD, Tw is exponential with parameter Ai and thus 

f(Tw < a 2 -Ai ) = * ~ ex P(~ ^1/(^2 — Ai)) so that W is metastable if Ai <cA2— Ai. 
As in the LSI case where |9]) could be used to define a good reaction coordinate, the 
characterization ( fT8~| of metastability in terms of the QSD could be useful to define 
what a good partition of the configurational space (namely a good function S^) is. 



3.2 A first application: analysis of the parallel replica dynamics 

The notion of QSD can be used to analyze an algorithm called the parallel replica 
dynamics which has been introduced by A.F Voter in |48|, and which is based on 
some Markovianity assumption, as will become clear below. The QSD is a way to 
quantify the error introduced by this Markovianity assumption, and to explain in 
which context the algorithm is efficient. The aim of the parallel replica dynamics is 
to generate very efficiently a process (S t )t>o with values in N, and which is close 
to (S^(Xt))t>o- Indeed, in many cases, one is not interested in the details of the dy- 
namics of X t : only the hopping events from one state to another are of interest. One 
important requirement is that the two stochastic processes (S t ) t >o and (=5^pQ)) ; >o 
should be close in terms of the law of the trajectories (not only the time marginals, 
for example, see also Remark[4]above). 

Let us first describe the algorithm. This is a three stage algorithm. In the decor- 
relation step a reference walker evolves according to dlj, up to a time it stayed suf- 
ficiently long in the same state. More precisely, a time denoted T corr is introduced, 
and one proceeds to the next stage at a time fo if and only if S^(X t ) = ko is constant 
over the time interval [fo — T COJT ,?o]. During all this step, S t is by definition ,^{X t ) 
so that no error is introduced. Let us assume that the decorrelation step has been 
successful, and let us proceed to the dephasing step. It consists in introducing N 
replicas of the reference walker in the state WL and to let them evolve sufficiently 
long, conditionally to the fact that they do not leave the state, and to retain their 
final position. In other words, the dephasing step is basically a realization of the 
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Fleming- Viot process introduced in the previous section. During this stage which is 
of course done in parallel, the process S t is not evolved. Finally, the speed up comes 
from the last step called the parallel step. It consists in letting all the walkers evolve 
independently from the initial conditions obtained in the previous dephasing step. 
Then the first escape event is detected, namely 

no = argmin {7^} 

«e{i N] 

where 7™ is the first escape time from W for the n-th replica. Then, S t = ko over 
the time interval [to, to + NT™], and one proceeds to a new decorrelation step, the 
reference walker starting from the exit point X"° . The speed-up of course comes 

from the fact we consider only the first escape event among the N walkers. As will 
become clear below, this event occurs in a time N times smaller than the time it 
would take for a single walker to leave the state. 

Let us now discuss the error analysis of this procedure. A first remark is that at 
the end of the decorrelation step, if T CO n- has been chosen sufficiently large (namely 

icon- ^ i i , according to (fT7))), it is reasonable to assume that X tQ is approxi- 

M — M 

mately distributed according to the QSD. Thus, according to Proposition[3] the time 
it still remains to go out of the state WL is exponentially distributed, and independent 
of the exit point. Then, the aim of the dephasing step is clear: one wants to obtain 
N initial conditions independently and identically distributed according to the QSD. 
The parallel step is thus fully justified. Indeed, concerning the time spent in the state 
ko, since (7^, . . . , T$) are N i.i.d. exponential random variables, 7^° = min„ 7^ is 
also an exponential random variable and NT^, has the same law as 7^: this explains 
why the simulation clock is advanced by the amount of time NT^ Q at the end of the 
parallel step. Moreover, concerning the exit point, since the exit time and the exit 
point are independent random variables when starting from the QSD, considering 
X"?, as the exit point is correct in terms of distribution (it has the same law as XL ). 

In summary, the crucial parameter is T colT , which is used in the decorrelation 
step. The error which is made by one iteration of the algorithm can be formalized 
by considering: 

e{t)= sup \E(f(T w -t,X Tw )\T w >t)-E v (f(T w ,X Tw ))\, 

/:R + xdW->K,||/|| i ~<l 

where E v here denotes an expectation over functionals of X,, the initial condition Xq 
being distributed according to v. In words, e(T CO rr) measures the difference of what 
would have been the law of the couple of random variables (exit time, exit point) 
if the simulation of the reference walker would have been continued, compared to 
the law of the same couple of random variables, if we assume that the reference 
walker is distributed according to the QSD. A slight adaptation of Proposition [2] 
above shows that <?(?) < Cexp(— (A2 — X\)t) so that T CO n- should be chosen larger 
than 1 / (A2 — Ai ). The constant C here depends on the law of the initial condition 
in the state. On the other hand, T CO n- should be chosen smaller than the typical time 
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it takes to leave the state, in order for the decorrelation step to have a chance to be 
successful. With these two requirements on T C01T , it thus becomes clear that this algo- 
rithm is efficient if most of the states are metastable, in the sense of ( fT8| ), otherwise 
the decorrelation step will never be successful. 

In conclusion, the interest of the QSD in this context is twofold: (i) it enables to 
understand how large T corr should be in order not to introduce too much error in one 
iteration of the algorithm and (ii) it helps to define the assumptions required for the 
algorithm to be efficient. 



3.3 A second application: going from continuous state space 
dynamics to kinetic Monte Carlo models 

The notion of QSD could also be useful in order to formalize the construction of 
discrete state space Markov models (so called kinetic Monte Carlo models |49l in 
the context of molecular dynamics) from continuous state space Markov models 
such as ([T}. Let us recall that a stochastic process S t with values in N is Markovian 
if and only if: (Ml) the list of visited states (forgetting about time) is Markovian and 
(M2) once S t takes a new value (it enters a new state), the time it takes to leave this 
state is exponentially distributed and independent of the next visited state. 

There are basically two approaches to build such a connection. In the so-called 
milestoning approach ll45l[T5ll . one considers some disjoint subsets (the milestones) 
of the configuration space (think of small balls around the local minima of the po- 
tential V) and one considers the last milestone visited by X t . The interest of this ap- 
proach is that in the limit of very small subsets, the first requirement above (Ml) is 
naturally satisfied. On the other hand, satisfying (M2) is more involved, and requires 
some assumptions related to the metastability of the process. Here metastability ba- 
sically means that the time spent outside the milestones is very small compared to 
the time spent in the milestones, see EH). The main drawback of this approach is 
that if the milestones are too small, the process X, spends a significant time outside 
of the milestones, so that the stochastic process built as "the last visited milestone" 
may not be a sufficiently fine coarse-grained description of the original process X t , 
in order to extract useful macroscopic information (change of conformation of a 
molecular system, for example). Of course, this depends a lot on the system at hand. 

The second natural approach which has been followed in the previous section and 
by many authors 11261 l44l |49l is to consider a full partition of the state space, and 
to consider at a given time f, in which state the process is. In the previous notation, 
it thus consists in considering Again, the process y(X t ) has no reason to 

be Markovian. The approximation suggested by the approach outlined above is to 
introduce a Markov process S t as follows: using the notation of the previous section, 
when S t jumps to a new value Icq (one can imagine that the underlying process X, 
just enters a new state Wk), the time it takes to leave the value ko is exponentially 
distributed with parameter X\, and, independently, the next visited state is drawn 
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according to the exit point distribution I — — — = — exp(— j3V) ) dXj w . Here, we 

V PM on J 
used the notation of the previous section: in particular, [X\,u\) are the first eigen- 
value and eigenfunction of the operator L on W, with zero Dirichlet boundary con- 
dition on dW. This procedure would be exact if, as soon as X t entered a new state, 
it would immediately be distributed according to the QSD. The error introduced 
by this coarse-grained description is thus related to the metastability of the original 
process, namely to the fact that when it enters a new state, it reaches the QSD before 
leaving the state. Contrary to the previous results presented in this paper, we have 
not yet fully formalized these ideas from a mathematical viewpoint. Notice that in 
the approach we propose here, the kernel of the approximating Markov process is 
computed using the QSD as an initial distribution in a given state, in contrast to what 
can be found usually in the literature, namely starting from the canonical measure 
jj. restricted to the state. The interest of starting with the QSD is that the underlying 
assumptions ruling a Markov process (see (Ml) and (M2) above) are automatically 
satisfied. 
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